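Multi-armed bandits (MAB) have been studied extensively in a variety of settings where the objective is to \textit{maximize} an action's outcome (i.e., reward) over time. Since safety is crucial in many real-world problems, safe versions of MAB algorithms have also garnered considerable interest. In this work, we tackle a different critical task through the lens of \textit{linear stochastic bandits}, where the aim is to keep the actions' outcomes close to a target level while respecting a \textit{two-sided} safety constraint, which we call \textit{leveling}. Such a task is prevalent in numerous domains. Many healthcare problems, for instance, require keeping a physiological variable within a range and preferably close to a target level. The radical change in our objective necessitates a new acquisition strategy, which is at the heart of a MAB algorithm. We propose SALE-LTS: Safe Leveling via Linear Thompson Sampling, an algorithm with a novel acquisition strategy that accommodates our task, and show that it achieves sublinear regret with the same time and dimension dependence as previous works on the classical reward maximization problem without any safety constraint. We demonstrate and discuss our algorithm's empirical performance in detail via thorough experiments.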
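The abstract does not spell out SALE-LTS's acquisition strategy. The following is therefore only a minimal sketch of how a leveling objective changes a linear Thompson sampling step, under stated assumptions: a Gaussian posterior $\mathcal{N}(\mu, \Sigma)$ over the unknown parameter, a finite action set, and a naive safety filter that uses the sampled parameter alone. The function names and the filter are hypothetical. Unlike the paper's algorithm, this filter gives no high-probability safety guarantee.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L L^T == a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_theta(mean: Vector, cov: Matrix, rng: random.Random) -> Vector:
    """Thompson step: draw theta ~ N(mean, cov) as mean + L z with z ~ N(0, I)."""
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(low[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mean)]


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def leveling_action(
    actions: Sequence[Sequence[float]],
    theta: Sequence[float],
    target: float,
    lower: float,
    upper: float,
) -> Optional[int]:
    """Hypothetical leveling rule: among actions whose sampled outcome x.theta lies in
    [lower, upper], return the index of the one closest to `target` (None if none do)."""
    best, best_gap = None, math.inf
    for idx, x in enumerate(actions):
        y = dot(x, theta)
        if not (lower <= y <= upper):
            continue  # sampled outcome violates the two-sided constraint
        gap = abs(y - target)
        if gap < best_gap:
            best, best_gap = idx, gap
    return best
```

Reward maximization would pick $\arg\max_x x^\top\tilde\theta$. The leveling rule instead picks the action whose sampled outcome is nearest the target level, and it skips actions whose sampled outcome leaves the safe interval.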
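In many real-world applications of combinatorial bandits, such as content caching, rewards must be maximized while satisfying minimum service requirements. In addition, base arm availabilities vary over time, and the actions taken need to adapt to the situation to maximize rewards. We propose a new bandit model called Contextual Combinatorial Volatile Bandits with Group Thresholds to address these challenges. Our model subsumes combinatorial bandits by considering super arms to be subsets of groups of base arms. We seek to maximize super arm rewards while satisfying the thresholds of all base arm groups that constitute a super arm. To this end, we define a new notion of regret that merges super arm reward maximization with group reward satisfaction. To facilitate learning, we assume that the mean outcomes of base arms are samples from a Gaussian process indexed by the context, and that the expected reward is Lipschitz continuous in the expected base arm outcomes. We propose an algorithm, called Thresholded Combinatorial Gaussian Process Upper Confidence Bounds (TCGP-UCB), that balances maximizing the cumulative reward against satisfying group reward thresholds, and prove that it incurs $\tilde{O}(K\sqrt{T\overline{\gamma}_{T}})$ regret with high probability, where $\overline{\gamma}_{T}$ is the maximum information gain associated with the set of base arm contexts that appeared in the first $T$ rounds and $K$ is the maximum super arm cardinality of any feasible action over all rounds. We show in experiments that our algorithm accumulates a reward comparable with that of the state-of-the-art combinatorial bandit algorithm, while picking actions whose groups satisfy their thresholds.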
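TCGP-UCB builds on Gaussian-process upper confidence bounds for base arms. The sketch below computes only the generic GP-UCB index $\mu(x) + \sqrt{\beta}\,\sigma(x)$ of one base arm from its context. It assumes scalar contexts, a zero-mean prior with a squared-exponential kernel, and fixed noise variance and $\beta$; all of these are illustrative choices. Super-arm selection and the group-threshold logic are not shown, since the abstract does not detail them.

```python
import math
from typing import List, Sequence, Tuple


def rbf(x: float, y: float, length_scale: float = 1.0) -> float:
    """Squared-exponential kernel on scalar contexts (an assumed kernel choice)."""
    return math.exp(-0.5 * ((x - y) / length_scale) ** 2)


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T == a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / low[j][j]
    return low


def cho_solve(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_ucb(
    train_x: Sequence[float],
    train_y: Sequence[float],
    query: float,
    noise_var: float = 0.01,
    beta: float = 4.0,
) -> Tuple[float, float, float]:
    """(posterior mean, posterior std, UCB index) of a zero-mean GP at `query`."""
    if not train_x:
        return 0.0, 1.0, math.sqrt(beta)  # prior: mean 0, variance rbf(q, q) = 1
    gram = [
        [rbf(a, b) + (noise_var if i == j else 0.0) for j, b in enumerate(train_x)]
        for i, a in enumerate(train_x)
    ]
    low = cholesky(gram)
    k_star = [rbf(a, query) for a in train_x]
    alpha = cho_solve(low, train_y)            # (K + s^2 I)^{-1} y
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = cho_solve(low, k_star)                 # (K + s^2 I)^{-1} k_*
    var = max(rbf(query, query) - sum(k * w for k, w in zip(k_star, v)), 0.0)
    std = math.sqrt(var)
    return mean, std, mean + math.sqrt(beta) * std
```

Contexts far from every observed one fall back to the prior, so their index is large. This is the optimism that drives exploration in UCB-style algorithms.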
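Finding optimal personalized treatment regimens is considered one of the most challenging problems in precision medicine. Various patient characteristics influence the response to treatment; hence, there is no one-size-fits-all regimen. Moreover, administering even a single unsafe dose during the course of treatment can have catastrophic consequences for the patient's health. Therefore, a personalized treatment model must ensure patient {\em safety} while {\em efficiently} optimizing the course of therapy. In this work, we study a prevalent and essential medical problem in which the treatment aims to keep a physiological variable within a range, preferably close to a target level. Such a task is relevant in other domains as well. We propose ESCADA, a generic algorithm for this problem structure that makes individualized and context-aware optimal dose recommendations while assuring patient safety. We derive high-probability upper bounds on the regret of ESCADA, along with safety guarantees. Finally, we run extensive simulations on the {\em bolus insulin dose} allocation problem in type 1 diabetes and compare ESCADA's performance against Thompson sampling, rule-based dose allocators, and clinicians.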
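We introduce vector optimization problems with stochastic bandit feedback, which extend the best arm identification problem to vector-valued rewards. We consider $K$ designs with multi-dimensional mean reward vectors, which are partially ordered according to a polyhedral ordering cone $C$. This generalizes the concept of the Pareto set in multi-objective optimization and allows different preferences to be encoded by $C$. Unlike prior work, we define approximations of the Pareto set based on direction-free covering and gap notions. We study the problem of identifying an ($\epsilon,\delta$)-PAC Pareto set, where each evaluation of a design yields a noisy observation of its mean reward vector. To characterize the difficulty of learning the Pareto set, we introduce the concept of {\em ordering complexity}, that is, geometric conditions on the deviations of the empirical reward vectors from their means under which the Pareto front can be approximated accurately. We show how to compute the ordering complexity of any polyhedral ordering cone. We provide gap-dependent and worst-case lower bounds on the sample complexity and show that, in the worst case, the sample complexity scales with the square of the ordering complexity. Furthermore, we investigate the sample complexity of the na\"ive elimination algorithm and prove that it nearly matches the worst-case sample complexity. Finally, we run experiments to verify our theoretical results and to illustrate how $C$ and the sampling budget affect the Pareto set, the returned ($\epsilon,\delta$)-PAC Pareto set, and the success of identification.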
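To make ordering by a polyhedral cone concrete, here is a minimal sketch that computes the exact Pareto set of known mean vectors under maximization. It assumes the cone is given in half-space form $C = \{z : Wz \ge 0\}$ for some matrix $W$; with $W = I$ this reduces to the usual componentwise Pareto order. The function names are hypothetical. The sketch does not implement $(\epsilon,\delta)$-PAC identification or the na\"ive elimination algorithm, both of which work with noisy estimates rather than true means.

```python
from typing import List, Sequence


def in_cone(z: Sequence[float], w: Sequence[Sequence[float]], tol: float = 1e-12) -> bool:
    """True if W z >= 0 componentwise, i.e. z lies in C = {z : W z >= 0}."""
    return all(sum(wi * zi for wi, zi in zip(row, z)) >= -tol for row in w)


def pareto_set(means: Sequence[Sequence[float]], w: Sequence[Sequence[float]]) -> List[int]:
    """Indices of designs not dominated by any other design under the cone order.

    Design j dominates design i when means[j] - means[i] is a nonzero vector in C.
    """
    result = []
    for i, mi in enumerate(means):
        dominated = False
        for j, mj in enumerate(means):
            diff = [a - b for a, b in zip(mj, mi)]
            if j != i and any(d != 0.0 for d in diff) and in_cone(diff, w):
                dominated = True
                break
        if not dominated:
            result.append(i)
    return result
```

A wider cone lets more differences count as improvements, so the Pareto set can only shrink. For the points $(1,0)$, $(0,1)$, $(0.3,0.3)$, the orthant ($W = I$) keeps all three designs. The cone with $W$ = [[2, 1], [1, 2]], which contains the orthant, keeps only the first two.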
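We consider the problem of optimizing a vector-valued objective function $\boldsymbol{f}$ sampled from a Gaussian process (GP) whose index set is a well-behaved, compact metric space $({\cal X},d)$ of designs. We assume that $\boldsymbol{f}$ is not known beforehand and that evaluating $\boldsymbol{f}$ at design $x$ results in a noisy observation of $\boldsymbol{f}(x)$. Since identifying the Pareto optimal designs via exhaustive search is infeasible when the cardinality of ${\cal X}$ is large, we propose an algorithm, called Adaptive $\boldsymbol{\epsilon}$-PAL, that exploits the smoothness of the GP-sampled function and the structure of $({\cal X},d)$ to learn fast. In essence, Adaptive $\boldsymbol{\epsilon}$-PAL employs a tree-based adaptive discretization technique to identify an $\boldsymbol{\epsilon}$-accurate Pareto set of designs in as few evaluations as possible. We provide both information-type and metric dimension-type bounds on the sample complexity of $\boldsymbol{\epsilon}$-accurate Pareto set identification. We also show experimentally that our algorithm outperforms other Pareto set identification methods on several benchmark datasets.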
Common disabilities such as stroke and spinal cord injuries may cause loss of motor function in the hands. They can be treated with robot-assisted rehabilitation techniques, such as continuously opening and closing the hand with the help of a robot, more cheaply and in less time than with traditional methods. Hand exoskeletons have been developed to assist rehabilitation, but their bulkiness brings certain challenges. Soft robots use elastomeric and fabric elements rather than heavy links, and they are actuated pneumatically, hydraulically, or by tendons rather than by traditional rotary or linear motors. For these reasons, soft hand exoskeletons are deemed a better option for rehabilitation.
Smart retail stores are becoming a fact of life. Several computer vision and sensor-based systems work together to achieve such a complex, automated operation. Moreover, the retail sector has several open and challenging problems that can be solved with the help of pattern recognition and computer vision methods. One important problem to be tackled is planogram compliance control. In this study, we propose a novel method to solve it. The proposed method consists of three steps: object detection, planogram compliance control, and focused and iterative search. The object detection step consists of local feature extraction and implicit shape model formation. The planogram compliance control step performs sequence alignment via a modified Needleman-Wunsch algorithm. The focused and iterative search step aims to improve the performance of the object detection and planogram compliance control steps. We tested all three steps on two different datasets. Based on these tests, we summarize the key findings as well as the strengths and weaknesses of the proposed method.
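The abstract does not say how the Needleman-Wunsch algorithm is modified. The sketch below is therefore the standard global alignment with assumed scores (+1 match, -1 mismatch, -1 gap). It is applied to hypothetical sequences of product labels read along a shelf row: one sequence from the planogram, one from the detector.

```python
from typing import List, Sequence, Tuple

GAP = "-"


def needleman_wunsch(
    a: Sequence[str], b: Sequence[str], match: int = 1, mismatch: int = -1, gap: int = -1
) -> Tuple[int, List[str], List[str]]:
    """Standard global alignment; returns (score, aligned a, aligned b) with GAP for gaps."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # Fill the DP table: best score for aligning a[:i] with b[:j].
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(GAP); i -= 1
        else:
            out_a.append(GAP); out_b.append(b[j - 1]); j -= 1
    out_a.reverse()
    out_b.reverse()
    return score[n][m], out_a, out_b
```

For instance, aligning the expected row ["cola", "cola", "chips", "soap"] with the detected row ["cola", "chips", "soap"] scores 2 (three matches, one gap). The single gap in the detected row flags one missing product facing.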
We demonstrate transfer learning-assisted neural network models for optical matrix multipliers with scarce measurement data. Our approach uses less than 10% of the experimental data needed for best performance and outperforms analytical models for a Mach-Zehnder interferometer mesh.
We propose a method for in-hand 3D scanning of an unknown object from a sequence of color images. We cast the problem as reconstructing the object surface from unposed multi-view images and rely on a neural implicit surface representation that captures both the geometry and the appearance of the object. In contrast to most NeRF-based methods, we do not assume that the camera-object relative poses are known; instead, we simultaneously optimize both the object shape and the pose trajectory. Because global optimization over all shape and pose parameters is prone to fail without a coarse initialization of the poses, we propose an incremental approach. It starts by splitting the sequence into carefully selected overlapping segments within which the optimization is likely to succeed. We incrementally reconstruct the object shape and track the object poses independently within each segment, and then merge all segments by aligning the poses estimated at the overlapping frames. Finally, we perform a global optimization over all the aligned segments to achieve full reconstruction. We show experimentally that the proposed method reconstructs the shape and color of both textured and challenging texture-less objects, outperforms classical methods that rely only on appearance features, and performs close to recent methods that assume known camera poses.
We investigate the problem of risk-averse robot path planning from the perspectives of deep reinforcement learning and distributionally robust optimization. Our problem formulation models the robot as a stochastic linear dynamical system and assumes that a collection of process noise samples is available. We cast the risk-averse motion planning problem as a Markov decision process and propose a continuous reward function design that explicitly accounts for the risk of collision with obstacles while encouraging the robot's motion toward the goal. We learn the risk-averse robot control actions through Lipschitz-approximated Wasserstein distributionally robust deep Q-learning to hedge against the noise uncertainty. The learned control actions yield a safe, risk-averse trajectory from the source to the goal that avoids all obstacles. Various supporting numerical simulations demonstrate the proposed approach.